Improving Processor Performance Through Compiler-Assisted Block Reuse

Author

  • Jian Huang

Abstract

Superscalar microprocessors currently power the majority of computing machines. These processors can execute multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. In theory, most programs contain a considerable amount of ILP. In practice, however, the amount of ILP that can be exploited within a fixed instruction window and preset hardware resources is typically quite limited. At the same time, researchers have observed that the values produced by executing instructions exhibit considerable value locality: repeated executions of the same instruction often produce the same values. Hence, while its performance is limited by the exploitable ILP, the processor is also doing redundant work. A natural response is to remove as much of this redundant work as possible. Value prediction and value reuse are two promising approaches to this problem. Value prediction does not actually remove the redundant work. Instead, it increases the available ILP by allowing dependent instructions to execute speculatively once the values of their operands have been predicted. Value reuse, on the other hand, aims to remove the redundancy by buffering the results previously produced by instructions and skipping the execution of redundant instructions.

This thesis focuses on value reuse schemes. Previous value reuse mechanisms use a single instruction as the reuse unit, i.e., only one instruction is skipped per reuse-detection process. This research shows that value reuse at granularities larger than a single instruction could further improve the performance of superscalar processors by skipping the execution of several instructions per reuse-detection process. Basic-block reuse, sub-block reuse, trace reuse, and function reuse schemes are studied in detail. These schemes cover the full spectrum of value reuse granularity, from a single instruction to an entire function. Simulation results show that block reuse with compiler assistance has substantial potential to improve the performance of superscalar processors. In particular, a block of instructions, such as a basic block or a sub-block, is shown to behave like a super-instruction that exhibits a substantial amount of value locality. Basic blocks and sub-blocks provide a convenient link between the processor hardware and the compiler, allowing the compiler to influence and help improve the performance of block reuse with only a reasonable amount of hardware. Finally, block reuse is shown to outperform reuse mechanisms based on other reuse units.

This thesis makes three primary contributions. First, it extends the value locality concept beyond the single instruction by studying the value behavior of instruction blocks and evaluating the feasibility of block reuse. Second, it develops a microarchitecture for block reuse and evaluates its performance using a realistic processor model. Finally, it develops a compiler algorithm that intelligently slices basic blocks into sub-blocks, exposing more value reuse opportunities than hardware-only mechanisms can.
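The idea of a block as a super-instruction can be illustrated with a small software model. The Python sketch below is a hypothetical block reuse buffer: each entry is keyed by a block's start address and the values of its live-in registers, and a hit supplies the block's buffered live-out register values so that every instruction in the block is skipped. The names `Block` and `BlockReuseBuffer`, the FIFO replacement policy, and the register-only model (memory inputs and outputs are ignored) are illustrative assumptions, not the microarchitecture developed in the thesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]


@dataclass
class Block:
    """Hypothetical model of a basic block or sub-block as a super-instruction."""
    pc: int                       # start address of the block (part of the lookup key)
    inputs: Tuple[str, ...]       # live-in registers the block reads
    body: Callable[[Regs], Regs]  # maps live-in values to live-out values


@dataclass
class BlockReuseBuffer:
    """Sketch of a reuse buffer keyed by (block PC, live-in values)."""
    capacity: int = 1024
    entries: Dict[Tuple[int, Tuple[int, ...]], Regs] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def __post_init__(self) -> None:
        if self.capacity <= 0:
            raise ValueError("capacity must be positive")

    def execute(self, block: Block, regs: Regs) -> None:
        live_in = tuple(regs[r] for r in block.inputs)
        key = (block.pc, live_in)
        outputs = self.entries.get(key)
        if outputs is not None:
            # Reuse hit: the whole block is skipped and the buffered
            # live-out values are written back directly.
            self.hits += 1
        else:
            # Miss: execute the block normally, giving it only its declared inputs.
            self.misses += 1
            outputs = block.body(dict(zip(block.inputs, live_in)))
            if len(self.entries) >= self.capacity:
                # Assumed FIFO replacement: evict the oldest inserted entry.
                del self.entries[next(iter(self.entries))]
            self.entries[key] = outputs
        regs.update(outputs)


if __name__ == "__main__":
    # Example block: r3 = r1 + r2; r4 = r3 * 2
    blk = Block(pc=0x400, inputs=("r1", "r2"),
                body=lambda v: {"r3": v["r1"] + v["r2"],
                                "r4": (v["r1"] + v["r2"]) * 2})
    buf = BlockReuseBuffer(capacity=4)
    regs = {"r1": 1, "r2": 2}
    for _ in range(3):
        buf.execute(blk, regs)
    print(buf.hits, buf.misses, regs["r3"], regs["r4"])  # 2 1 3 6
```

A block containing a single instruction reduces this model to conventional instruction-level reuse. A larger block lets one buffer lookup skip several instructions, at the cost of a wider key: all of the block's input values must match a buffered entry before the block can be reused.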

Similar sources

Extending Value Reuse to Basic Blocks with Compiler Support

Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instructio...

Compiler-Assisted Sub-Block Reuse

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various alternatives to exploit this value locality, such as value prediction and value reuse. Value prediction improves the available Instruction-Level Parallelism (ILP) by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value ...

Compiler-assisted Hybrid Operand Communication

Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many cons...

Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various techniques, such as value prediction and value reuse, to exploit this behavior. Value prediction improves the available Instruction-Level Parallelism (ILP) in superscalar processors by allowing dependent instructions to be executed speculatively after predicting the values of the...

Exploiting Basic Block Value Locality with Block Reuse

Value prediction at the instruction level has been introduced to allow more aggressive speculation and reuse than previous techniques. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable, suggesting that using compiler support to extend value prediction and reuse to a coarser granularity may have substantial performance bene...

Publication date: 2000